Impact of audio segmentation and segment clustering on automated transcription accuracy of large spoken archives
Abstract
This paper addresses the influence of audio segmentation and segment clustering on automatic transcription accuracy for large spoken archives. The work forms part of the ongoing MALACH project, which is developing advanced techniques for supporting access to the world’s largest digital archive of video oral histories, collected in many languages from over 52,000 survivors and witnesses of the Holocaust. We present several audio-only and audio-visual segmentation schemes, including two novel schemes: the first is iterative and audio-only, the second uses audio-visual synchrony. Unlike most previous work, we evaluate these schemes in terms of their impact upon recognition accuracy. Results on English interviews show that the automatic segmentation schemes give performance comparable to (exorbitantly expensive and impractically lengthy) manual segmentation when using a single-pass decoding strategy based on speaker-independent models. However, when using a multiple-pass decoding strategy with adaptation, results are sensitive to both the initial audio segmentation and the scheme for clustering segments prior to adaptation: the combination of our best automatic segmentation and clustering scheme has an error rate 8% (relative) worse than manual audio segmentation and clustering, due to the occurrence of “speaker-impure” segments.
Similar resources
Segment Generation and Clustering in the HTK Broadcast News Transcription System
This paper describes the segmentation, gender detection and segment clustering scheme used in the 1997 HTK broadcast news evaluation system and presents results on both the unpartitioned 1996 development and the 1997 evaluation sets. The stages of our approach are presented, namely classification, segmentation and gender detection, gender relabelling, and clustering of speech segments. The eval...
Efficient Access to Lecture Audio Archives through Spoken Language Processing
The paper first addresses the current state of speech recognition using the “Corpus of Spontaneous Japanese (CSJ)”. It is shown that the large-scale corpus had a strong impact on training acoustic and language models that take into account the morphological and pronunciation variations characteristic of spontaneous Japanese. Unsupervised adaptation of these models and the speaking rate is also effec...
High Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation
Image segmentation is one of the most common steps in digital image processing. There are many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task, mainly due to the noise, low contrast, and steep...
Image Segmentation: Type–2 Fuzzy Possibilistic C-Mean Clustering Approach
Image segmentation is an essential issue in image description and classification. Currently, in many real applications, segmentation is still mainly manual or strongly supervised by a human expert, which makes it irreproducible and deteriorating. Moreover, there are many uncertainties and vagueness in images, which crisp clustering and even Type-1 fuzzy clustering could not handle. Hence, Type-...
Voting for two speaker
The process of locating the end points of each speaker's voice in an audio file and then clustering segments based on speaker identity is called speaker segmentation. In this paper we present a method for two-speaker segmentation, though it can be extended to more than two speakers. Most methods for speaker segmentation and clustering start with an initial computationally inexpensive speaker seg...
Publication date: 2003